Fix TokenTextSplitter for punctuation marks handling #5005

ilayaperumalg · 2025-12-02T13:17:55Z

The TokenTextSplitter was incorrectly splitting small text into multiple chunks when punctuation marks were present, even when the entire text was well below the configured chunk size.
Only apply punctuation-based truncation when the remaining tokens exceed the chunk size (tokens.size() > chunkSize). This ensures small texts remain as single chunks while preserving correct splitting behavior for larger texts.
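
As a rough illustration of the guard described above, here is a minimal, self-contained sketch. It is not the actual Spring AI implementation: the class `PunctuationGuardSketch`, the `split` and `endsWithPunctuation` helpers, and the use of whitespace-separated words in place of real encoder tokens are all assumptions made for the example. The only detail taken from this PR is the condition `tokens.size() > chunkSize`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; words stand in for encoder tokens.
class PunctuationGuardSketch {

    static List<String> split(String text, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be at least 1");
        }
        List<String> chunks = new ArrayList<>();
        if (text == null || text.isBlank()) {
            return chunks;
        }
        List<String> tokens = new ArrayList<>(List.of(text.trim().split("\\s+")));
        while (!tokens.isEmpty()) {
            int end = Math.min(chunkSize, tokens.size());
            // Guard from the PR description: look for a punctuation boundary
            // only when more tokens remain than fit in a single chunk.
            if (tokens.size() > chunkSize) {
                for (int i = end - 1; i >= 0; i--) {
                    if (endsWithPunctuation(tokens.get(i))) {
                        end = i + 1; // cut just after the punctuation mark
                        break;
                    }
                }
            }
            chunks.add(String.join(" ", tokens.subList(0, end)));
            tokens = new ArrayList<>(tokens.subList(end, tokens.size()));
        }
        return chunks;
    }

    // Assumed set of sentence-ending marks for this sketch.
    static boolean endsWithPunctuation(String token) {
        char last = token.charAt(token.length() - 1);
        return last == '.' || last == '?' || last == '!';
    }
}
```

In this sketch, a short input that fits within `chunkSize` is returned as one chunk even if it contains sentence-ending punctuation, while a longer input is still cut at the last punctuation mark inside each chunk-sized window.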

- The TokenTextSplitter was incorrectly splitting small text into multiple chunks when punctuation marks were present, even when the entire text was well below the configured chunk size.
- Only apply punctuation-based truncation when the remaining tokens exceed the chunk size (tokens.size() > chunkSize). This ensures small texts remain as single chunks while preserving correct splitting behavior for larger texts.

Fixes spring-projects#4981

Signed-off-by: Ilayaperumal Gopinathan <ilayaperumal.gopinathan@broadcom.com>

markpollack · 2025-12-03T23:01:11Z

merged

main: e065965
1.1.x: 8cc4ea4

ilayaperumalg added this to the 2.0.0.M1 milestone Dec 2, 2025

ilayaperumalg added the "for: backport-to-1.1.x" and "bug" labels Dec 2, 2025

ilayaperumalg assigned markpollack Dec 2, 2025

markpollack closed this Dec 3, 2025
